# README for replication files for Boussalis, Chadefaux, Decadri and Salvi: "Public and Private Information in International Crises", International Studies Quarterly.


## Overview

The code in this replication package reproduces all figures and tables in the paper's main text and appendix. Scholars interested in replicating the results can either run "RUN_ME.sh", which automatically replicates the entire analysis (see the "One-step Instructions for Replicators" section below), or proceed step by step using the procedure described in the "Step-by-step Instructions for Replicators" section below.


## Data availability and provenance

The diplomatic cables were collected from the historical collection "Documents diplomatiques français relatifs aux origines de la guerre de 1914", which is available on the website of the Library of France. The newspaper text from Le Figaro was also collected from the Library of France portal. Both corpora are provided in ./topic_models/corpora. The cables corpus is stored as a Python pickle file, while the Le Figaro corpus is stored as a CSV file. Both files contain the original text, metadata, and the pre-processed text used in the subsequent topic modelling. The pre-processing steps are described in the appendix.


## One-step Instructions for Replicators

To replicate the entire analysis, open a terminal, change your directory to "replication_material" (cd /path/to/replication_material), and execute RUN_ME.sh by entering ./RUN_ME.sh at the command line. Note: You may first need to make RUN_ME.sh executable (in a terminal: chmod +x RUN_ME.sh). This shell script has been tested successfully on Ubuntu 20.04 LTS (Focal Fossa) and macOS 10.15.
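The commands above can be collected into a small shell function. This is only a minimal sketch: the function name run_replication is hypothetical and not part of the package, and /path/to/replication_material is the placeholder path used above.

```shell
# Hypothetical helper (not part of the package): run the full replication
# from the directory that contains RUN_ME.sh.
run_replication() {
    repl_dir="$1"
    if [ ! -f "$repl_dir/RUN_ME.sh" ]; then
        echo "RUN_ME.sh not found in $repl_dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$repl_dir" || exit 1
        chmod +x RUN_ME.sh   # harmless if the script is already executable
        ./RUN_ME.sh
    )
}

# Example call; replace the placeholder with your own path:
# run_replication /path/to/replication_material
```

The existence check is there so that a mistyped path produces a clear message rather than a failed cd.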

Alternatively, you may replicate the analysis step-by-step by following the instructions below.

The replicator should expect the entire process to take approximately 20 minutes on a personal computer.


## Step-by-step Instructions for Replicators

### Step-by-step replication of text analyses and Figures 1 and 2 in the main text and Figure 1 in the appendix

Note: To replicate the topic modelling, first install the Python packages listed in requirements.txt. Also, remember to change the working directory in each of the scripts listed in the steps below to ./topic_models.

1. Using Python 3.8 or greater, execute run_nmf_cables.py and run_nmf_newspaper.py (both files can be found in ./topic_models). Please ensure that these scripts are run from the directory directly above /dynamic-nmf-model, which contains the dynamic topic model that both scripts depend on. The scripts write their output to ./topic_models/output_cables and ./topic_models/output_newspaper. These output directories include:

- `*_dtmatrix_compact.csv`: a document-topic weight matrix that also includes metadata. These files are used as input in the prediction section of the study.
- `*_dtm_results.pkl`: a pickle file that contains the dynamic topic model results.
- `*_keywords.csv`: contains the topic number and the 20 highest-weighted words per topic.
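The step above can be sketched as two shell functions: one that runs both topic-model scripts from inside the topic_models directory, and one that checks whether each expected output pattern matched at least one file. This is a sketch under stated assumptions, not the package's own tooling. The function names and the PYTHON override are hypothetical. The sketch also assumes that the required Python packages are already installed and that dynamic-nmf-model sits directly inside topic_models, which is how the instruction above reads.

```shell
# Hypothetical helper: run both topic-model scripts from topic_models.
# PYTHON is an assumed override for the interpreter (defaults to python3).
run_topic_models() {
    tm_dir="$1"
    py="${PYTHON:-python3}"
    (
        cd "$tm_dir" || exit 1
        if [ ! -d dynamic-nmf-model ]; then
            echo "dynamic-nmf-model not found in $tm_dir" >&2
            exit 1
        fi
        "$py" run_nmf_cables.py && "$py" run_nmf_newspaper.py
    )
}

# Hypothetical helper: report which expected output files are present.
check_topic_outputs() {
    tm_dir="$1"
    rc=0
    for out_dir in output_cables output_newspaper; do
        for suffix in _dtmatrix_compact.csv _dtm_results.pkl _keywords.csv; do
            if ls "$tm_dir/$out_dir"/*"$suffix" >/dev/null 2>&1; then
                echo "ok: $out_dir/*$suffix"
            else
                echo "missing: $out_dir/*$suffix"
                rc=1
            fi
        done
    done
    return "$rc"
}

# Example calls; replace the placeholder path with your own:
# run_topic_models /path/to/replication_material/topic_models
# check_topic_outputs /path/to/replication_material/topic_models
```

Checking the outputs before moving on is useful because the document-topic matrices feed the prediction step later in the replication.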

2. Using R, Figure 1 in the main text can be generated by running ./topic_models/Figure1_paper.R. This script will write a plot to ./descriptives
3. Using R, Figure 2 in the main text can be generated by running ./topic_models/Figure2_paper.R. This script will write a plot to ./predictive_validity
4. Using R, Figure 1 in the appendix can be generated by running ./topic_models/Figure1_appendix.R. This script will write a plot to ./topic_similarity
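The three R scripts above can also be run from a terminal with Rscript rather than from an interactive R session. The following is a minimal sketch. The function name make_topic_figures and the RSCRIPT override are hypothetical. It assumes that the scripts run non-interactively, that the R packages they need are installed, and that the working directory inside each script has already been adjusted as noted earlier.

```shell
# Hypothetical helper: run the three figure scripts in order and stop at
# the first failure. RSCRIPT is an assumed override (defaults to Rscript).
make_topic_figures() {
    repl_dir="$1"
    rscript="${RSCRIPT:-Rscript}"
    (
        cd "$repl_dir" || exit 1
        for script in Figure1_paper.R Figure2_paper.R Figure1_appendix.R; do
            "$rscript" "topic_models/$script" || exit 1
        done
    )
}

# Example call; replace the placeholder with your own path:
# make_topic_figures /path/to/replication_material
```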


### Step-by-step replication of in- and out-of-sample analyses (all other figures and tables in the main text and the appendix)

Using RStudio (R Markdown), knit the file predictionReplication_notebook.Rmd, available in the folder Forecasting_Replication. The resulting figures and tables will be written to Forecasting_Replication/Results. The results are also compiled in a single PDF document titled predictionReplication_notebook.pdf.
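Replicators who prefer a terminal to RStudio may be able to knit the notebook with rmarkdown::render instead. This is a sketch only and has not been tested against this package. It assumes that the rmarkdown package and pandoc are installed outside RStudio. The function name knit_notebook and the RSCRIPT override are hypothetical.

```shell
# Hypothetical helper: knit the notebook from inside Forecasting_Replication
# so that relative paths in the notebook resolve as they would in RStudio.
knit_notebook() {
    repl_dir="$1"
    rscript="${RSCRIPT:-Rscript}"
    (
        cd "$repl_dir/Forecasting_Replication" || exit 1
        "$rscript" -e 'rmarkdown::render("predictionReplication_notebook.Rmd")'
    )
}

# Example call; replace the placeholder with your own path:
# knit_notebook /path/to/replication_material
```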


## License for Data

The code is licensed under a Creative Commons/CC-BY-NC/CC0 license.


